Abstract

We consider the problem of portfolio selection within the classical Markowitz mean-variance framework,
reformulated as a constrained least-squares regression problem. We propose to add to the objective function
a penalty proportional to the sum of the absolute values of the portfolio weights. This penalty regularizes
(stabilizes) the optimization problem, encourages sparse portfolios (i.e. portfolios with only a few active
positions), and makes it possible to account for transaction costs. Our approach recovers the
no-short-positions portfolios as a special case, but also allows for a limited number of short positions.
We implement this methodology on two benchmark data sets constructed by Fama and French. Using only a
modest amount of training data, we construct portfolios whose out-of-sample performance, as measured by
Sharpe ratio, is consistently and significantly better than that of the naive evenly-weighted portfolio,
which constitutes, as shown in recent literature, a very tough benchmark.

Keywords: Portfolio Choice, Sparsity, Penalized Regression 

1 Introduction

In 1951, Harry Markowitz ushered in the modern era of portfolio theory by applying simple mathematical ideas
to the problem of formulating optimal investment portfolios (Markowitz, 1952). Single-minded pursuit of high
returns constitutes a poor strategy, Markowitz argued. Instead, he suggested, rational investors must balance
a desire for high returns with a desire for low risk, as measured by variability of returns.

It is not trivial, however, to translate Markowitz's conceptual framework into a satisfactory portfolio selection 
algorithm in a real-world context. Indeed, in a recent survey, DeMiguel, Garlappi, and Uppal (2007) examined 
several portfolio construction algorithms inspired by the Markowitz framework. The authors found that, given 
a reasonable amount of training data, none of the surveyed algorithms is able to significantly or consistently 
outperform the naive strategy in which each available asset is given an equal weight in the portfolio. This 
disappointing algorithm performance is likely due, at least in part, to the structure of the Markowitz optimization 
framework as originally proposed. Specifically, the optimization at the core of the Markowitz scheme is, as 
originally formulated, empirically unstable: small changes in assumed asset returns, volatilities or correlations 
can have large effects on the output of the optimization procedure. In this sense, the classic Markowitz portfolio 
optimization can be viewed as an ill-posed (or ill-conditioned) inverse problem. Such problems are frequently 
encountered in many other fields, where a variety of regularization procedures have been proposed to tame the 
troublesome instabilities (Bertero and Boccacci, 1998). 

In this paper, we discuss a regularization of Markowitz's portfolio construction. We shall restrict ourselves 
to the traditional Markowitz mean-variance approach 1; moreover, we focus on one particular regularization
method, and highlight some very special properties of the regularized portfolios obtained through its use. 

Our proposal consists of adding an $\ell_1$ penalty to the Markowitz objective function. We allow ourselves to
adjust the importance of this penalty with a "tunable" coefficient. For large values of this coefficient,
optimization of the penalized objective function turns out to be equivalent to solving the original (unpenalized)
problem under an additional positivity condition on the weights. As the tunable coefficient is decreased, the optimal
solutions are given more and more latitude to include short positions. The optimal solutions for our penalized
objective function can thus be seen as natural generalizations of the "no-short-positions" portfolios considered
by Jagannathan and Ma (2003). In addition to stabilizing the optimization problem (Daubechies, Defrise, and
De Mol, 2004) and generalizing no-short-positions-constrained optimization, the $\ell_1$ penalty facilitates treatment
of transaction costs. For large investors, whose principal cost is a fixed bid-ask spread, transaction costs are
effectively proportional to the gross market value of the selected portfolio, and thus to our $\ell_1$ penalty term.
For small investors, volume-independent "overhead" costs cannot be ignored, and thus transaction costs are
best modeled via a combination of an $\ell_1$ penalty term and the number of assets transacted; minimizing such a
combination is tantamount to searching for sparse solutions (sparse portfolios or sparse changes to portfolios),
a goal that we shall see is also achieved by our use of an $\ell_1$ penalty term. 2

1 Similar ideas could also be applied to different portfolio construction frameworks considered in the literature.

We implement the methodology and compute efficient investment portfolios using as our assets two sets of 
portfolios constructed by Fama and French: the 48 industry portfolios and the 100 portfolios formed on size 
and book-to-market. Using data from 1971 to 2006, we construct an ensemble of portfolios for various values of
our tunable coefficient and track their out-of-sample performances. We find a consistent and significant increase 
in Sharpe ratio compared to the naive equal-weighting strategy. When using the 48 industry portfolios as our 
assets, we find that the best portfolios we construct have no short positions. When our assets are the 100 
portfolios formed on size and book-to-market, we find that the best portfolios constructed by our methodology 
do include short positions. 

We are not alone in proposing the use of regularization in the context of Markowitz-inspired portfolio 
construction; DeMiguel, Garlappi, Nogales, and Uppal (2007) discuss several different regularization techniques 
for the portfolio construction problem, including the imposition of constraints on appropriate norms of the 
portfolio weight vector. Our work 3 differs from DeMiguel, Garlappi, Nogales, and Uppal (2007) in that our goal 
is not only regularization: we are interested in particular in the stable construction of sparse portfolios, which 
is achieved by $\ell_1$ penalization, as demonstrated by our analysis and examples.

The organization of our paper is as follows. In the next section, we formulate the problem of portfolio
selection and we describe our methodology based on $\ell_1$-penalized least-squares regression. In Section 3, we describe
key mathematical properties of the portfolios we construct and devise an efficient algorithm for computing them.
In Section 4, we present empirical results consisting of an out-of-sample performance evaluation of our sparse
portfolios. In Section 5, we discuss possible extensions of our methodology to other portfolio construction
problems which can be naturally translated into optimizations involving $\ell_1$ penalties. Finally, Section 6 summarizes
our findings.

2 Sparse portfolio construction 

We consider $N$ securities and denote their returns at time $t$ by $r_{i,t}$, $i = 1, \ldots, N$. We write
$r_t = (r_{1,t}, \ldots, r_{N,t})^\top$ for the $N \times 1$ vector of returns at time $t$. We assume that the returns are
stationary and write $E[r_t] = \mu$ for the vector of expected returns of the different assets, and
$E[(r_t - \mu)(r_t - \mu)^\top] = C$ for the covariance matrix of returns.

A portfolio is defined to be a list of weights $w_i$ for assets $i = 1, \ldots, N$ that represent the amount of capital
to be invested in each asset. We assume that one unit of capital is available and require that capital to be fully
invested. Thus we must respect the constraint that $\sum_{i=1}^{N} w_i = 1$. We collect the weights in an $N \times 1$ vector
$w = (w_1, \ldots, w_N)^\top$. The normalization constraint on the weights can thus be rewritten as $w^\top 1_N = 1$, where
$1_N$ denotes the $N \times 1$ vector in which every entry is equal to 1. For a given portfolio $w$, the expected return
and variance are equal to $w^\top \mu$ and $w^\top C w$, respectively.

2 A sparse portfolio is a portfolio with few active positions, i.e. few non-zero weights.

3 The first presentation of our work is given in Brodie, Daubechies, De Mol, and Giannone (2007), independent of and simultaneous
with DeMiguel, Garlappi, Nogales, and Uppal (2007).

In the traditional Markowitz portfolio optimization, the objective is to find a portfolio which has minimal
variance for a given expected return $\rho = w^\top \mu$. More precisely, one seeks $w$ such that:

$$
\begin{aligned}
w = \arg\min_{w}\ & w^\top C w \\
\text{s.t. } & w^\top \mu = \rho \\
& w^\top 1_N = 1.
\end{aligned}
$$

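Since $C = E[r_t r_t^\top] - \mu\mu^\top$, the minimization problem is equivalent to

$$
\begin{aligned}
w = \arg\min_{w}\ & E\left[\,|\rho - w^\top r_t|^2\,\right] \\
\text{s.t. } & w^\top \mu = \rho \\
& w^\top 1_N = 1.
\end{aligned}
$$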

For the empirical implementation, we replace expectations with sample averages. We set
$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} r_t$ and define $R$ to be the $T \times N$ matrix of which row $t$ is given by
$r_t^\top$, that is, $R_{t,i} = (r_t)_i = r_{i,t}$. Given this notation, we thus seek to solve the following
optimization problem

$$
\begin{aligned}
w = \arg\min_{w}\ & \frac{1}{T}\,\|\rho 1_T - Rw\|_2^2 && (1) \\
\text{s.t. } & w^\top \hat{\mu} = \rho \\
& w^\top 1_N = 1,
\end{aligned}
$$

where, for a vector $a$ in $\mathbb{R}^T$, we denote by $\|a\|_2^2$ the sum $\sum_{t=1}^{T} a_t^2$.
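
The equivalence between the variance form and the regression form of the objective can be checked numerically. The following is a minimal sketch on synthetic data, not part of the original analysis: the return matrix, the weight vector and all variable names are illustrative assumptions, and the sample covariance is taken with the $1/T$ normalization so that it mirrors $C = E[r_t r_t^\top] - \mu\mu^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 60, 5
# Synthetic T x N matrix of returns (an assumption, for illustration only).
R = 0.01 + 0.05 * rng.standard_normal((T, N))

mu_hat = R.mean(axis=0)                         # sample mean return of each asset
C_hat = R.T @ R / T - np.outer(mu_hat, mu_hat)  # sample analogue of E[r r^T] - mu mu^T

# Hypothetical weights that sum to one (two short positions included).
w = np.array([0.5, 0.4, 0.3, -0.1, -0.1])
rho = w @ mu_hat                                # pick rho so that w^T mu_hat = rho holds

variance = w @ C_hat @ w                            # w^T C w
regression_form = np.sum((rho - R @ w) ** 2) / T    # (1/T) * ||rho 1_T - R w||_2^2

print(variance, regression_form)
```

Both printed numbers agree up to rounding whenever the return constraint $w^\top \hat{\mu} = \rho$ holds, which is why the portfolio problem can be treated as a constrained regression.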

This problem requires the solution of a constrained multivariate regression involving many potentially 
collinear variables. While this problem is analytically simple, it can be quite challenging in practice, depending 
on the nature of the matrix R. Specifically, the condition number — defined to be the ratio of the largest to 
smallest singular values of a matrix — of R can effectively summarize the difficulty we will face when trying to 
perform this optimization in a stable way. When the condition number of R is small, the problem is numerically 
stable and easy to solve. However, when the condition number is large, a non-regularized numerical procedure 
will amplify the effects of noise anisotropically, leading to an unstable and unreliable estimate of the vector w. 
As asset returns tend to be highly correlated, the smallest singular value of R can be quite small, leading to 
a very large condition number and thus very unstable optimizations in a financial context. It is this sort of
instability that likely plagues many of the algorithms reviewed by DeMiguel, Garlappi, and Uppal (2007) and 
renders them unable to outperform the naive portfolio. 
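
To make the role of the condition number concrete, the sketch below compares a matrix of independent synthetic returns with one in which all assets share a common factor. The data, the noise level and the helper name `condition_number` are assumptions chosen for illustration; no real return data is used.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 60, 10  # 60 months of returns on 10 hypothetical assets

# Independent assets: every column is its own noise series.
R_indep = rng.standard_normal((T, N))

# Highly correlated assets: one common factor plus a little idiosyncratic noise.
factor = rng.standard_normal((T, 1))
R_corr = factor + 0.01 * rng.standard_normal((T, N))

def condition_number(R):
    """Ratio of the largest to the smallest singular value of R."""
    s = np.linalg.svd(R, compute_uv=False)  # singular values, in descending order
    return s[0] / s[-1]

cond_indep = condition_number(R_indep)
cond_corr = condition_number(R_corr)
print(f"independent: {cond_indep:.1f}, correlated: {cond_corr:.1f}")
```

The correlated matrix has a much smaller least singular value, and hence a much larger condition number, which is the situation described above for real asset returns.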

To obtain meaningful, stable results for such ill-conditioned problems, one typically adopts a regularization 
procedure. One fairly standard approach is to augment the objective function of interest with a penalty term, 
which can take many forms and ideally should have a meaningful interpretation in terms of the specific problem 
at hand. We propose here to add a so-called $\ell_1$ penalty to the original Markowitz objective function (1). We
thus seek to find a vector of portfolio weights w that minimizes 

$$
\begin{aligned}
w^{[\tau]} = \arg\min_{w}\ & \left[\|\rho 1_T - Rw\|_2^2 + \tau\|w\|_1\right] && (2) \\
\text{s.t. } & w^\top \hat{\mu} = \rho && (3) \\
& w^\top 1_N = 1. && (4)
\end{aligned}
$$

Here the $\ell_1$ norm of a vector $w$ in $\mathbb{R}^N$ is defined by $\|w\|_1 := \sum_{i=1}^{N} |w_i|$, and $\tau$ is a parameter that allows us
to adjust the relative importance of the $\ell_1$ penalization in our optimization. Note that we absorbed the factor
$1/T$ from (1) in the parameter $\tau$. The particular problem of minimizing an (unconstrained) objective function
of the type (2) was named lasso regression by Tibshirani (1996).

Adding an $\ell_1$ penalty to the objective function (1) has several useful consequences:

• It promotes sparsity. That penalizing or minimizing $\ell_1$ norms can have a sparsifying effect has long
been observed in statistics (see e.g. Chen, Donoho, and Saunders (2001) and the references therein).
Minimization of $\ell_1$-penalized objective functions is now a widely used technique when sparse solutions are
desirable. Sparsity should also play a key role in the task of formulating investment portfolios. Indeed,
investors frequently want to be able to limit the number of positions they must create, monitor and
liquidate. By considering suitably large values of $\tau$ in (2), one can achieve just such an effect within
our framework. Figure 1 illustrates geometrically how the addition of an $\ell_1$ term to the unconstrained
volatility minimization encourages sparse solutions.

Figure 1 panels: (a) Very small $\ell_1$ penalty: tangency on edge. (b) Small $\ell_1$ penalty: tangency on edge,
nearing vertex. (c) Moderately sized $\ell_1$ penalty: tangency reaches vertex. (d) Large $\ell_1$ penalty: tangency
remains at vertex, moves toward origin along axis.

Figure 1: $\ell_1$ penalties promote sparsity: a geometric illustration in 2 dimensions. In $N$ dimensions, the level sets of the $\ell_1$ norm
are so-called cross polytopes, while the level sets of the volatility term (1) are typically ellipsoids. In 2 dimensions, we illustrate
these as diamonds and circles respectively. The minimizer of (2) must be located at a point where the level sets of the $\ell_1$ term
and the volatility term are tangent. There is a continuous family of such points, and the solution to a particular minimization will
be determined by the relative sizes of the terms in the objective function (2). When the $\ell_1$ norm is given a small weight in the
objective function, optimal solutions lie on level sets of large $\ell_1$ norm. These large $\ell_1$ level sets intersect the volatility level sets at
generic points where many, if not all, of the entries in $w$ differ from zero. However, as the weight on the $\ell_1$ term is increased, the
tangency moves onto smaller level sets of the $\ell_1$ norm. Since the $\ell_1$ ball is "pointy," the tangencies move toward the corners of the
$\ell_1$ level sets where more and more entries in $w$ are equal to 0. Moreover, as Figure 2 shows, any $\ell_p$ penalty with $0 \le p \le 1$ will
lead to "pointy" level sets that give rise to a similar sparsification effect.

• It regulates the amount of shorting in the portfolio designed by the optimization process. Because of the
constraint (4), an equivalent form of the objective function in (2) is

$$
\|\rho 1_T - Rw\|_2^2 + 2\tau \sum_{i \text{ with } w_i < 0} |w_i| + \tau, \qquad (5)
$$

in which the last term is of course irrelevant for the optimization process. Under the constraint (4), the
$\ell_1$ penalty is thus equivalent to a penalty on short positions.

The no-short-positions optimal portfolio, obtained by minimizing (1) under the three constraints given by
not only (3) and (4) but also the additional restriction $w_i \ge 0$ for $i = 1, \ldots, N$, is in fact the optimal
portfolio for (5) in the limit of extremely large values of $\tau$. As the high-$\tau$ limit of our framework, it is
completely natural that the positive solution should be quite sparse; this sparsity of optimal
no-short-positions portfolios can indeed also be observed in practice. (See Section 4.) We note that the literature
has focused on the stability of positive solutions, but seems to have overlooked the sparsity of such
solutions. This may be due, possibly, to the use of iterative numerical optimization algorithms and a
stopping criterion that halted the optimization before most of the components had converged to their zero
limit. By decreasing $\tau$ in the $\ell_1$-penalized objective function to be optimized, one relaxes the constraint
without removing it completely; it then no longer imposes positivity absolutely, but still penalizes overly
large negative weights.

• It stabilizes the problem. By imposing a penalty on the size of the coefficients of $w$ in an appropriate
way, we reduce the sensitivity of the optimization to the possible collinearities between the assets. In
Daubechies, Defrise, and De Mol (2004), it is proved (for the unconstrained case) that any $\ell_p$ penalty
on $w$, with $1 \le p \le 2$, suffices to stabilize the minimization of (1) by regularizing the inverse problem.
The stability induced by the $\ell_1$ penalization is extremely important; indeed, it is this stability property
that makes practical, empirical work possible with only limited training data. For example, De Mol,
Giannone, and Reichlin (2006) show that this regularization method can be used to produce accurate
macroeconomic forecasts using many predictors. Figure 2 shows the geometric impact of an $\ell_p$ penalty for
the unconstrained problem for various values of $p$. It also illustrates why only $\ell_p$ penalties with $0 \le p \le 1$
are able to promote sparsity.
Figure 2: A geometric look at regularization penalties. The panels depict sets of fixed $\ell_p$ norm for various values of $p$ (from
left to right, $p = 1/2, 1, 2, 3$), in 3 dimensions. For any fixed $p$, adding an $\ell_p$ penalty to a given minimization encourages solutions
to stay within regions around the origin defined by scaled versions of the $\ell_p$ ball. For $p \ge 1$ the amended minimization problem
remains convex, and thus algorithmically more tractable; on the other hand $\ell_p$ penalties with $0 \le p \le 1$ encourage sparsity (see
Figure 1). In our optimization, we focus on the case $p = 1$, which has both desirable features.

• It makes it possible to account for transaction costs in a natural way. In addition to the choice of the 
securities they trade, real-world investors must also concern themselves with the transaction costs they 
will incur when acquiring and liquidating the positions they select. Transaction costs in a liquid market 
can be modeled by a two-component structure: one that is a fixed "overhead", independent of the size of
the transaction, and a second one, given by multiplying the transacted amount with the market-maker's 
bid-ask spread applicable to the size of the transaction. 

For large investors, the overhead portion can be neglected; in that context, the total transaction cost paid
is just $\sum_{i=1}^{N} s_i |w_i|$, the sum of the products of the absolute trading volumes $|w_i|$ and bid-ask spreads $s_i$
for the securities $i = 1, \ldots, N$. We assume that the bid-ask spread is the same for all assets and constant
for a wide range of transaction sizes. In that case, the transaction cost is then effectively captured by an
$\ell_1$ penalty 4.

For small investors, the overhead portion of the transaction costs is non-negligible. In the case of a
very small investor, this portion may even be the only one worth considering; if the transaction costs
are asset-independent, then the total cost is simply proportional to the number $K$ of assets selected (i.e.
corresponding to non-zero weights), a number sometimes referred to as $\|w\|_0$, the $\ell_0$ norm of the weight
vector. Like an $\ell_1$ sum, this $\ell_0$ sum can be incorporated into the objective function to be minimized;
however, $\ell_0$-penalized optimization is computationally intractable when more than a handful of variables
are involved because its complexity is essentially combinatorial in nature, and grows super-exponentially
with the number of variables. For this reason, one often replaces the $\ell_0$ penalty, when it occurs, by its
much more tractable (convex) $\ell_1$-penalty cousin, which has similar sparsity-promoting properties. In this
sense, our $\ell_1$ penalization is thus "natural" even for small investors.
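
The identity behind the shorting interpretation in (5) is that, for weights summing to one, $\|w\|_1 = 1 + 2\sum_{i:\, w_i<0} |w_i|$: the positive weights must add up to one plus the total short exposure. The snippet below is a small check of this identity on a hypothetical weight vector; the function names are illustrative and not taken from the paper.

```python
def l1_norm(w):
    """Sum of the absolute values of the weights."""
    return sum(abs(x) for x in w)

def short_exposure(w):
    """Total absolute weight held in short (negative) positions."""
    return sum(-x for x in w if x < 0)

# Hypothetical fully invested portfolio with two short positions.
w = [0.7, 0.5, -0.15, 0.0, -0.05]

lhs = l1_norm(w)                      # ||w||_1
rhs = 1.0 + 2.0 * short_exposure(w)   # 1 + 2 * (sum of |w_i| over short positions)
print(lhs, rhs)
```

Multiplying both sides by $\tau$ gives the penalty term of (5), with the constant $\tau$ playing no role in the minimization.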

3 Optimization strategy 

We first quickly review the unconstrained case, i.e. the minimization of the objective function (2), and then 
discuss how to deal with the constraints (3) and (4). 

Various algorithms can be used to solve (2). For the values of the parameters encountered in the portfolio
construction problem, a particularly convenient algorithm is given by the homotopy method (Osborne, Presnell,
and Turlach, 2000a, b), also known as Least Angle Regression or LARS (Efron, Hastie, Johnstone, and
Tibshirani, 2004). This algorithm seeks to solve (2) for a range of values of $\tau$, starting from a very large value,
and gradually decreasing $\tau$ until the desired value is attained. As $\tau$ evolves, the optimal solution $w^{[\tau]}$ moves
through $\mathbb{R}^N$ on a piecewise affine path. As such, to find the whole locus of solutions for $w^{[\tau]}$ we need only find
the critical points where the slope changes. These slopes are thus the only quantities that need to be computed
explicitly, besides the breakpoints of the piecewise linear (vector-valued) function. For every value of $\tau$, the
entries $j$ for which $w_j \ne 0$ are said to constitute the active set $A_\tau$. Typically, the number of elements of $A_\tau$
increases as $\tau$ decreases. However, this is not always the case: at some breakpoints entries may need to be
removed from $A_\tau$; see e.g. Efron, Hastie, Johnstone, and Tibshirani (2004).

4 Our methodology can be easily generalized to asset-dependent bid-ask spreads; see Section 5.

When the desired minimizer contains only a small number K of non-zero entries, this method is very fast.
At each breakpoint, the procedure involves solving a linear system of k equations with k unknowns, k being 
the number of active variables, which increases until K is reached. This imposes a pragmatic feasibility upper 
bound on K. 
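
The homotopy/LARS method itself is not reproduced here. As a simpler stand-in, the sketch below minimizes the unconstrained objective $\|y - Rw\|_2^2 + \tau\|w\|_1$ by iterative soft-thresholding, the kind of scheme analyzed by Daubechies, Defrise, and De Mol (2004), and recomputes the solution for a few values of $\tau$ to show how the active set grows as $\tau$ decreases. The synthetic data, the iteration count and the function names `soft_threshold` and `ista` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink every entry of x toward zero by t, setting small entries to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(R, y, tau, n_iter=5000):
    """Minimize ||y - R w||_2^2 + tau * ||w||_1 by iterative soft-thresholding."""
    L = 2.0 * np.linalg.norm(R, 2) ** 2   # Lipschitz constant of the quadratic term's gradient
    w = np.zeros(R.shape[1])
    for _ in range(n_iter):
        gradient_step = w + (2.0 / L) * (R.T @ (y - R @ w))
        w = soft_threshold(gradient_step, tau / L)
    return w

# Synthetic regression problem with a sparse ground truth (illustration only).
rng = np.random.default_rng(2)
T, N = 60, 12
R = rng.standard_normal((T, N))
w_true = np.zeros(N)
w_true[:3] = [1.0, -0.5, 0.25]
y = R @ w_true + 0.1 * rng.standard_normal(T)

# For tau >= tau_max the minimizer of the unconstrained problem is w = 0.
tau_max = 2.0 * np.max(np.abs(R.T @ y))
for tau in [1.01 * tau_max, 0.5 * tau_max, 0.1 * tau_max, 0.01 * tau_max]:
    w = ista(R, y, tau)
    print(f"tau = {tau:8.3f}: {np.count_nonzero(w)} active entries")
```

Unlike the homotopy method, this approach needs a separate run for every value of $\tau$ and does not locate the breakpoints of the solution path exactly.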

The homotopy/LARS algorithm applies to the unconstrained $\ell_1$-penalized regression. The problem of interest
to us, however, is the minimization of (2) under the constraints (3), (4), in which case the original algorithm does
not apply. In the Appendix, we show how the homotopy/LARS algorithm can be adapted to deal with a general
$\ell_1$-penalized minimization problem with linear constraints, allowing us to find:

$$
w^{[\tau]} = \arg\min_{w \in H} \left[\|y - Rw\|_2^2 + \tau\|w\|_1\right] \qquad (6)
$$

where $H$ is a prescribed affine subspace, defined by the linear constraints. The adapted algorithm consists again
of starting with large values of $\tau$, and shrinking $\tau$ gradually until the desired value is reached, monitoring the
solution, which is still piecewise linear, and solving a linear system at every breakpoint in $\tau$. Because of the
constraints, the initial solution (for large values of $\tau$) is now more complex (in the unconstrained case, it is
simply equal to zero); in addition, extra variables (Lagrange multipliers) have to be introduced that are likewise
piecewise linear in $\tau$, and the slopes of which have to be recomputed at every breakpoint.

In the particular case of the minimization (2) under the constraints (3), (4), an interesting interplay takes
place between (4) and the $\ell_1$-penalty term. When the weights $w_i$ are all non-negative, the constraint (4) is
equivalent to setting $\|w\|_1 = 1$. Given that the $\ell_1$-penalty term takes on a fixed value in this case, minimizing
the quadratic term only (as in (1)) is thus equivalent to minimizing the penalized objective function in (2), for
non-negative weights $w_i$. This is consistent with the observation made by Jagannathan and Ma (2003) that a
restriction to non-negative-weights-only can have a regularizing effect on Markowitz's portfolio construction.

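The following mathematical observations have interesting consequences. Suppose that the two weight vectors
$w^{[\tau_1]}$ and $w^{[\tau_2]}$ are minimizers of (2), corresponding to the values $\tau_1$ and $\tau_2$ respectively, and both satisfy the
two constraints (3), (4). By using the respective minimization properties of $w^{[\tau_1]}$ and $w^{[\tau_2]}$, we obtain

$$
\begin{aligned}
\|\rho 1_T - Rw^{[\tau_1]}\|_2^2 + \tau_1\|w^{[\tau_1]}\|_1
&\le \|\rho 1_T - Rw^{[\tau_2]}\|_2^2 + \tau_1\|w^{[\tau_2]}\|_1 \\
&= \|\rho 1_T - Rw^{[\tau_2]}\|_2^2 + \tau_2\|w^{[\tau_2]}\|_1 + (\tau_1 - \tau_2)\|w^{[\tau_2]}\|_1 \\
&\le \|\rho 1_T - Rw^{[\tau_1]}\|_2^2 + \tau_2\|w^{[\tau_1]}\|_1 + (\tau_1 - \tau_2)\|w^{[\tau_2]}\|_1 \\
&= \|\rho 1_T - Rw^{[\tau_1]}\|_2^2 + \tau_1\|w^{[\tau_1]}\|_1 + (\tau_1 - \tau_2)\left(\|w^{[\tau_2]}\|_1 - \|w^{[\tau_1]}\|_1\right),
\end{aligned}
$$

which implies that

$$
(\tau_1 - \tau_2)\left(\|w^{[\tau_2]}\|_1 - \|w^{[\tau_1]}\|_1\right) \ge 0. \qquad (7)
$$

If some of the $w_i^{[\tau_2]}$ are negative, but all the entries in $w^{[\tau_1]}$ are positive or zero, then we have
$\|w^{[\tau_2]}\|_1 > \sum_{i=1}^{N} w_i^{[\tau_2]} = 1$; on the other hand, $\|w^{[\tau_1]}\|_1 = 1$ (because the $w_i^{[\tau_1]}$ are all
non-negative), implying $\|w^{[\tau_2]}\|_1 > \|w^{[\tau_1]}\|_1$. In view of (7), this means that $\tau_1 \ge \tau_2$.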

It follows that the optimal portfolio with non-negative entries obtained by our minimization procedure
corresponds to the largest values of $\tau$, and thus typically to the sparsest solution (since the penalty term, promoting
sparsity, is weighted more heavily). This particular portfolio is a minimizer of (2), under the constraints (3)
and (4), for all $\tau$ larger than some critical value $\tau_0$. For smaller $\tau$ the optimal portfolio will contain at least
one negative weight and will typically become less sparse. However, as in the unconstrained case, this need not
happen in a monotone fashion.

Although other optimization methods could be used to compute the sparse portfolios we define, the
motivation behind our choice of a constrained homotopy/LARS algorithm is the fact that we are only interested in
computing portfolios involving only a small number of securities and that we use the parameter $\tau$ to tune this
number. Whereas other algorithms would require separate computations to find solutions for each value of $\tau$, a
particularly nice feature of our LARS-based algorithm is that, by exploiting the piecewise linear dependence of
the solution on $\tau$, it obtains, in one run, the weight vectors for all values of $\tau$ (i.e. for all numbers of selected
assets) in a prescribed range.

4 Empirical application



In this section we apply the methodology described above to construct optimal portfolios and evaluate their 
out-of-sample performance. 

We present two examples, each of which uses a universe of investments compiled by Fama and French 5. In
the first example, we use 48 industry sector portfolios (abbreviated to FF48 in the remainder of this paper).
In the second example, we use 100 portfolios formed on size and book-to-market (FF100) 6. In both FF48 and
FF100, the portfolios are constructed at the end of each June.

4.1 Example 1: FF48 

Using the notation of Section 2, $r_{i,t}$ is the annualized return in month $t$ of industry $i$, where $i = 1, \ldots, 48$. We
evaluate our methodology by looking at the out-of-sample performances of our portfolios during the last 30 
years in a simulated investment exercise. 

Each year from 1976 to 2006, we construct a collection of optimal portfolios by solving an ensemble of
minimizations of (2) with constraints (3) and (4). For each time period, we carry out our optimization for a
sufficiently wide range of $\tau$ so as to produce an ensemble of portfolios containing different numbers of active
positions; ideally we would like to construct portfolios with $K$ securities, for all values of $K$ between 2 and
48. As explained below, we do not always obtain all the low values of $K$; typically we find optimal portfolios
only for $K$ exceeding a minimal value $K_{\min}$ that varies from year to year (see Figure 3). To estimate the
necessary return and covariance parameters, we use data from the preceding 5 years (60 months). At the time
of each portfolio construction, we set the target return, $\rho$, to be the average return achieved by the naive,
evenly-weighted portfolio over the previous 5 years.

For example, our first portfolio construction takes place at the end of June 1976. To determine $R$ and $\hat{\mu}$, we
use the historical returns from July 1971 until June 1976. We then solve the optimization problem using this
matrix and vector, targeting an annualized return of 6.60% ($\rho = 0.066$), equal to the average historical return,
from July 1971 until June 1976, obtained by a portfolio in which all industry sectors are given the equal weight
1/48. We compute the weights of optimal solutions $w^{[\tau]}$ for $\tau$ ranging from large to small values. We select
these portfolios according to some criterion we would like to meet. We could, e.g., target a fixed total number of
active positions, or limit the number of short positions; see below for examples. Once a portfolio is thus fixed,
it is kept from July 1976 until June 1977, and its returns are recorded. At the end of June 1977, we repeat the
same process, using training data from July 1972 to June 1977 to compute the composition of a new collection
of portfolios. These portfolios are observed from July 1977 until June 1978 and their returns are recorded. The
same exercise is repeated every year with the last ensemble of portfolios constructed at the end of June 2005.
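
The rolling schedule just described can be written down explicitly. The helper below is an illustrative sketch (its name and the `(year, month)` tuple representation are assumptions, not part of the paper) that lists, for each construction date, the five-year training window and the twelve-month holding period.

```python
def rolling_windows(first_year=1976, last_year=2005, train_years=5):
    """Return a list of (train_start, train_end, hold_start, hold_end) tuples.

    Each date is a (year, month) pair. Portfolios are built at the end of June
    of every year from first_year to last_year inclusive.
    """
    windows = []
    for year in range(first_year, last_year + 1):
        train_start = (year - train_years, 7)   # July, five years before construction
        train_end = (year, 6)                   # June of the construction year
        hold_start = (year, 7)                  # July of the construction year
        hold_end = (year + 1, 6)                # June of the following year
        windows.append((train_start, train_end, hold_start, hold_end))
    return windows

windows = rolling_windows()
print(len(windows), windows[0], windows[-1])
```

With the default arguments this gives 30 construction dates, the first trained on July 1971 to June 1976 and the last held from July 2005 to June 2006.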

Once constructed, the portfolios are thus held through June of the next year and their monthly out-of-sample
returns are observed. These monthly returns, for all the observation years together, constitute a time series; for
a given period (whether it is the full 1976-2006 period, or sub-periods), all the monthly returns corresponding
to this period are used to compute the average monthly return $m$, its standard deviation $\sigma$, and their ratio
$m/\sigma$, which is then the Sharpe ratio measuring the trade-off, corresponding to the period, between returns and
volatility of the constructed portfolios.
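
The performance statistics described above amount to a few lines of code. The following sketch assumes the pooled monthly returns are given as a plain list; the use of the sample standard deviation (divisor $n - 1$) is an assumption, since the text does not specify the normalization, and the function name is illustrative.

```python
import statistics

def sharpe_statistics(monthly_returns):
    """Return (m, sigma, m / sigma) for a pooled series of monthly returns."""
    m = statistics.mean(monthly_returns)
    sigma = statistics.stdev(monthly_returns)   # sample standard deviation (assumed convention)
    return m, sigma, m / sigma

# Two hypothetical monthly returns, for illustration only.
m, sigma, sharpe = sharpe_statistics([0.01, 0.03])
print(m, sigma, sharpe)
```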

We emphasize that the sole purpose of carrying out the portfolio construction multiple times, in successive 
years, is to collect data from which to evaluate the effectiveness of the portfolio construction strategy. These 
constructions from scratch in consecutive years are not meant to model the behavior of a single investor; they 
model, rather, the results obtained by different investors who would follow the same strategy to build their 
portfolio, starting in different years. A single investor might construct a starting portfolio according to the 
strategy described here, but might then, in subsequent years, adopt a sparse portfolio adjustment strategy such 
as described in Section 5. 

We compare the performance of our strategy to that of a benchmark strategy comprising an equal investment
in each available security. This $1/N$ strategy is a tough benchmark since it has been shown to outperform a
host of optimal portfolio strategies constructed with existing optimization procedures (DeMiguel, Garlappi, and
Uppal, 2007). To evaluate the $1/N$ strategy portfolios for the FF48 assets, we likewise observe the monthly
returns for a certain period (a 5-year break-out period or the full 30-year period), and use them to compute the
mean return $m$, the standard deviation $\sigma$, and the Sharpe ratio $m/\sigma$.

5 These data are available from the site
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, to which we refer for more details on these
portfolios.

6 These portfolios are the intersections of 10 portfolios formed on size and 10 portfolios formed on the ratio of book equity to
market equity.

Figure 3: Number of assets without short positions, for FF48. The number of active assets $K_{\text{pos.}}$ in $w_{\text{pos.}}$, the optimal portfolio
without short positions, from year to year. This number varies from 4 to 11; the average over 30 years is around 6.

Table 1: Performance of the sparse portfolio with no short-selling, for FF48.
Monthly mean return $m$, standard deviation of monthly return $\sigma$,
and corresponding Sharpe ratio $S$ (expressed in %) for the optimal
no-short-positions portfolio, as well as for the $1/N$-strategy portfolio.
Both portfolio strategies are tested for their performance over twelve
consecutive months immediately following their construction; their
returns are pooled over several years to compute $m$, $\sigma$ and $S$, as
described in the text. Statistics reflect the performance an investor
would have achieved, on average, by constructing the portfolio on July
1 one year, and keeping it for the next twelve months, until June 30
of the next year, with the average taken over 5 years for each of the
break-out periods, over 30 years for the full period.

We carried out the full procedure following several possible guidelines. The first such guideline is to pick
the optimal portfolio $w_{\text{pos.}}$ that has only non-negative weights $w_i$, i.e. the optimal portfolio without short
positions. As shown in Section 3, this portfolio corresponds to the largest values of the penalization constant
$\tau$; it typically is also the optimal portfolio with the fewest assets. Figure 3 reports the number of active assets
of this optimal no-short-positions portfolio from year to year. This number varies from a minimum of 4 to a
maximum of 11, with an average of around 6; note that this is quite sparse in a 48-asset universe. Table 1
reports statistics to evaluate the performances of the optimal no-short-positions portfolio. We give the statistics
for the whole sample period as well as for consecutive sub-periods extending over 5 years each, comparing these
to the portfolio that gives equal weight to the 48 assets. The table shows that the optimal no-short-positions
portfolio significantly outperforms the benchmark both in terms of returns and in terms of volatility; this result
holds for the full sample period as well as for the sub-periods. Note that most of the gain comes from the
smaller variance of the sparse portfolio around its target return, $\rho$.

A second possible guideline for selecting the portfolio construction strategy is to target a particular number
of assets, or a particular narrow range for this number. For instance, users could decide to pick, every year, the
optimal portfolio that always has at least 8 but at most 16 assets. Or the investor may decide to select an
optimal portfolio with, say, exactly 13 assets. In this case, we would carry out the minimization, decreasing $\tau$
until we reach the breakpoint value where the number of assets in the portfolio reaches 13. We shall denote the
corresponding weight vector by $w_{13}$.

Figure 4: Outperforming the $1/N$ strategy, for FF48. The Sharpe ratio, for the full period 1976-2006, for several portfolios: $w_{\mathrm{pos}}$,
the optimal portfolio without short positions (red); the portfolios $w_K$ with a fixed number, $K$, of active positions, where $K$ ranges
from 11 to 48 (green dots). $w_{\mathrm{pos}}$ is indicated by a fat solid horizontal red bar, stretching from 4 to 11 (its minimum to maximum
number of assets; see also Figure 3). The binned portfolios are marked by horizontal blue bars indicating the range of the
corresponding bin. The horizontal green line indicates the Sharpe ratio for the $1/N$ portfolio, in which equal amounts are invested
in all the assets; a large number of our portfolios significantly outperform this benchmark, with only the least sparse (most weakly
penalized) choices underperforming it.

Table 2: Empirical results for FF48.
Monthly mean return, $m$, standard deviation of monthly return, $\sigma$, and corresponding monthly Sharpe ratio $S$ (expressed
in %) for the optimal portfolios with 8-16, 17-24, 25-32, 33-40, 41-48 assets, as well as (again) the optimal no-short-positions
portfolio. Portfolios are constructed annually as described in the text. Statistics reflect the performance an
investor would have achieved on average, by constructing the portfolio on July 1 one year, and keeping it for the next
twelve months, until June 30 of the next year; the average is taken over several years: 5 for each break-out period, 30
for the full period.

For a "binned" portfolio, such as the 8-to-16 asset portfolio, which targets a narrow range rather than an exact
value for the total number of assets, we define the portfolio $w_{8\text{-}16}$ by considering each year the portfolios $w_K$
with $K$ between 8 and 16 (both extremes included), and selecting the one that minimizes the objective function
(1); if there are several possibilities, the minimizer with smallest $\ell_1$ norm is selected. The results are summarized
in Figure 4, which shows the average monthly Sharpe ratio of different portfolios of this type for the entire 30-year
exercise. For several portfolio sizes, we are able to significantly outperform the evenly-weighted portfolio
(the Sharpe ratio of which is indicated by the horizontal line at 27%). Detailed statistics are reported in Table
2; for comparison, Table 2 lists again the results for the no-short-positions portfolio.
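The selection rule for a binned portfolio (smallest objective value among the portfolios whose number of active positions falls in the bin, with ties broken by the smallest $\ell_1$ norm) can be sketched as follows. The function name, the `objective` callable standing in for the objective function (1), and the toy weight vectors are all hypothetical.

```python
def select_binned(portfolios, k_min, k_max, objective):
    """Among weight vectors with k_min <= (number of nonzero weights) <= k_max,
    return the one with the smallest objective value; ties are broken by the
    smallest l1 norm. Returns None if no candidate falls in the bin."""
    def n_active(w):
        return sum(1 for x in w if x != 0.0)

    def l1_norm(w):
        return sum(abs(x) for x in w)

    candidates = [w for w in portfolios if k_min <= n_active(w) <= k_max]
    if not candidates:
        return None
    # Tuples compare lexicographically: objective first, then l1 norm.
    return min(candidates, key=lambda w: (objective(w), l1_norm(w)))


# Toy example: a constant objective, so that only the tie-break matters.
candidates = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.7, 0.5, -0.2]]
chosen = select_binned(candidates, k_min=2, k_max=3, objective=lambda w: 0.0)
print(chosen)
```

In practice the candidates would be the breakpoint portfolios $w_K$ produced by the path algorithm for one year.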

Notice that, according to Table 2, the no-short-positions portfolio outperforms all binned portfolios for the
full 30-year period; this is not systematically true for the break-out periods, but even in those break-out periods
where it fails to outperform all binned portfolios, its performance is still close to that of the best performing
(and sparsest) of the binned portfolios. This observation no longer holds for the portfolio constructions with
FF100, our second exercise (see Figure 6 below).

4.2 Example 2: FF100 

Except for using a different collection of assets, this exercise is identical in its methodology to what was done 
for FF48, so that we do not repeat the full details here. Tables 3, 4 and Figures 5, 6 summarize the results; 
they correspond, respectively, to the results given in Tables 1, 2 and Figures 3, 4 for FF48. 
From the results of our two exercises we see that: 

• Our sparse portfolios (with a relatively small number of assets and moderate $\tau$) outperform the naive $1/N$
strategy significantly and consistently over the entire evaluation period. This gain is achieved for a wide
range of portfolio sizes, as indicated in Figures 4 and 6. It is to be noted that the best performing sparse
portfolio we constructed is not always the no-short-positions portfolio.

• When we target a large number of assets in our portfolio, the performance deteriorates. We interpret
this as a result of so-called "overfitting". Larger target numbers of assets correspond to smaller values of
$\tau$. The $\ell_1$ penalty then has only a negligible effect and the minimization focuses essentially on the
variance term. Hence, the solution becomes unstable and is overly sensitive to the estimation errors that
plague the original (unpenalized) Markowitz optimization problem (1).



5 Possible generalizations 

In this section, we briefly describe some extensions of our approach. It should be pointed out that the relevance 
and usefulness of the $\ell_1$ penalty is not limited to a stable implementation of the usual Markowitz portfolio
selection scheme described in Section 2. Indeed, there are several other portfolio construction problems that 
can be cast in similar terms or otherwise solved through the minimization of a similar objective function. We 
now list a few examples: 
Figure 5: Number of assets without short positions, for FF100. The number of active assets $K_{\mathrm{pos}}$ in $w_{\mathrm{pos}}$, the optimal portfolio
without short positions, from year to year. Note that $K_{\mathrm{pos}}(t)$ is also the minimum value $K_{\min}(t)$. For this data set, our construction
generates optimal portfolios for all values from $K_{\mathrm{pos}}(t)$ to 60 in every year $t$. $K_{\mathrm{pos}}(t)$ varies from 3 to 13; the average over 30 years
is 7.
Table 3: Performance of the sparse portfolio with no short-selling, for FF100.
Monthly mean return $m$, standard deviation of monthly return $\sigma$,
and corresponding Sharpe ratio $S$ (expressed in %) for the optimal
positive-weights-only portfolio, as well as for the $1/N$-strategy portfolio.
Both portfolio strategies are tested for their performance over
twelve consecutive months immediately following their construction;
their returns are pooled over several years to assess their performance,
as described in the text. Statistics reflect the performance an investor
would have achieved, on average, by constructing the portfolio on July
1 one year, and keeping it for the next twelve months, until June 30
of the next year; the average is taken over 5 years for each of the
break-out periods, over 30 years for the full period.



Table 4: Empirical results for FF100.
Monthly mean return, $m$, standard deviation of monthly return, $\sigma$, and corresponding monthly Sharpe ratio $S$
(expressed in %) for the optimal portfolios with 11-20, 21-30, 31-40, 41-50, 51-60 assets, as well as (again) the
optimal portfolio without short positions. Portfolios are constructed annually as described in the text. Statistics
reflect the performance an investor would have achieved on average, by constructing the portfolio on July 1 one
year, and keeping it for the next twelve months, until June 30 of the next year; the average is taken over several
years: 5 for each break-out period, 30 for the full period.
Figure 6: Outperforming the $1/N$ strategy, for FF100. The Sharpe ratio, for the full period 1976-2006, for several portfolios: $w_{\mathrm{pos}}$,
the optimal portfolio without short positions (red); the "binned" portfolios $w_{11\text{-}20}$, $w_{21\text{-}30}$, $w_{31\text{-}40}$, $w_{41\text{-}50}$ and $w_{51\text{-}60}$ (blue);
and the portfolios $w_K$ with a fixed number, $K$, of active positions, where $K$ ranges from 13 to 60 (green dots circled in black).
$w_{\mathrm{pos}}$ is indicated by a fat solid horizontal red bar, stretching from 3 to 13 (its minimum to maximum number of assets; see also
Figure 5). In this example, optimal sparse portfolios that allow short positions significantly outperform both the evenly-weighted
portfolio and the optimal no-short-positions portfolio.

5.1 Partial index tracking

In many situations, investors want to create a portfolio that efficiently tracks an index. In some cases, this will 
be an existing financial index whose level is tied to a large number of tradable securities but which is not yet 
tradable en masse as an index future or other single instrument. In such a situation, investors need to find a 
collection of securities whose profit and loss profile accurately tracks the index level. Such a collection need not 
be a full replication of the index in question; indeed it is frequently inconvenient or impractical to maintain a 
full replication. 

In other situations, investors will want to monetize some more abstract financial time series: an economic 
time series, an investor sentiment time series, etc. In that case, investors will need to find a collection of 
securities which is likely to remain correlated to the target time series. 

Either way, the investor will have at his disposal a time series of index returns, which we will write as a
$T \times 1$ column vector, $y$. Also, the investor will have at his disposal the time series of returns for every available
security, which we will write as a $T \times N$ matrix $R$, as before.

In that case, an investor seeking to minimize expected tracking error would want to find

$$\hat{w} = \arg\min_{w} \|y - Rw\|_2^2 .$$

However, this problem is simply a linear regression of the target returns on the returns of the available assets.
As the available assets may be collinear, the problem is subject to the same instabilities that we discussed above.
As such, we can augment our objective function with an $\ell_1$ penalty and seek instead

$$w^{[\tau]} = \arg\min_{w} \left[ \|y - Rw\|_2^2 + \tau \|w\|_1 \right] ,$$

subject to the appropriate constraints. This simple modification stabilizes the problem and enforces sparsity,
so that the index can be stably replicated with few assets.
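To illustrate how the penalty produces a sparse tracking portfolio, the unconstrained version of this problem can be solved approximately by iterative soft thresholding (Daubechies, Defrise, and De Mol, 2004). This is a different technique from the homotopy method used in this paper, and it ignores the linear constraints; the function names, the step-size choice and the toy data below are assumptions made for this sketch only.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista_tracking(R, y, tau, n_iter=2000):
    """Approximately minimize ||y - R w||_2^2 + tau * ||w||_1 (no constraints)."""
    n_periods, n_assets = len(R), len(R[0])
    # Step 1/L, where L = 2 * ||R||_F^2 bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(r * r for row in R for r in row))
    w = [0.0] * n_assets
    for _ in range(n_iter):
        resid = [y[t] - sum(R[t][i] * w[i] for i in range(n_assets))
                 for t in range(n_periods)]
        # b = R^T (y - R w); the gradient of the quadratic term is -2 b.
        b = [sum(R[t][i] * resid[t] for t in range(n_periods))
             for i in range(n_assets)]
        w = [soft_threshold(w[i] + 2.0 * step * b[i], step * tau)
             for i in range(n_assets)]
    return w


# Toy data: 4 periods, 3 assets; the "index" is 0.6 * asset 0 + 0.4 * asset 1.
R = [[0.02, 0.01, -0.01],
     [-0.01, 0.03, 0.02],
     [0.03, -0.02, 0.01],
     [0.01, 0.02, -0.02]]
y = [0.016, 0.006, 0.010, 0.014]
w_sparse = ista_tracking(R, y, tau=4e-4)
print(w_sparse)  # the third weight is exactly zero; the first two are shrunk
```

Components that are thresholded to zero are exactly zero, but unlike the homotopy method this scheme returns the minimizer for a single value of $\tau$ only, and only approximately.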

Moreover, one can enhance this objective function in light of the interpretation of the $\ell_1$ term as a model of
transaction costs. Let $s_i$ be the transaction cost (bid-ask spread) for the $i$th security. In that case, we can seek

$$w^{[\tau]} = \arg\min_{w} \left[ \|y - Rw\|_2^2 + \tau \sum_i s_i |w_i| \right] .$$

By making this modification, the optimization process will "prefer" to invest in more liquid securities (low $s_i$)
while it will "avoid" investments in less liquid securities (high $s_i$). A slightly modified version of the algorithm
described above can cope with such a weighted $\ell_1$ penalty and generate a list of portfolios for a wide range of values
of $\tau$. For each portfolio, the investor could then compare the expected tracking error per period ($\frac{1}{T}\|y - Rw\|_2^2$)
with the expected cost of creating and liquidating the tracking portfolio ($\sum_i s_i |w_i|$). The investor could then
select a portfolio that suits both his risk tolerance and cost constraints.
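The two quantities an investor would compare for each candidate portfolio, the tracking error per period and the weighted $\ell_1$ cost, are straightforward to evaluate. A minimal sketch with hypothetical helper names and toy data:

```python
def tracking_error_per_period(R, y, w):
    """Mean squared tracking error (1/T) * ||y - R w||_2^2."""
    residuals = [y_t - sum(r * x for r, x in zip(row, w)) for row, y_t in zip(R, y)]
    return sum(e * e for e in residuals) / len(residuals)


def transaction_cost(s, w):
    """Weighted l1 cost sum_i s_i * |w_i| of building the position w."""
    return sum(s_i * abs(w_i) for s_i, w_i in zip(s, w))


# Toy data (assumed for illustration): two periods, two securities.
R = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 1.0]
s = [0.1, 0.2]
for w in ([1.0, 0.0], [1.0, 1.0]):
    print(w, tracking_error_per_period(R, y, w), transaction_cost(s, w))
```

In this toy case the second portfolio tracks perfectly but costs more to build, which is the trade-off the list of portfolios along the $\tau$ path makes explicit.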



5.2 Portfolio hedging 

Consider the task of hedging a given portfolio using some subset of a universe of available assets. As a concrete 
example, imagine trying to efficiently hedge out the market risk in a portfolio of options on a single underlying 
asset, potentially including many strikes and maturities. An investor would be able to trade the underlying 
asset as well as any options desired. In this context, it would be possible to completely eliminate market risk by 
negating the initial position. However, this may not be feasible given liquidity (transaction cost) constraints. 

Instead, an investor may simply want to reduce his risk in a cost-efficient way. One could proceed as follows:
Generate a list of scenarios. For each scenario, determine the change in the value of the existing portfolio. Also,
determine the change in value for a unit of each available security. Store the former in an $M \times 1$ column vector $y$
and store the latter in an $M \times N$ matrix, $X$. Also, determine a probability, $p_i$ for $i = 1, \ldots, M$, of each scenario,
and store the square roots of these values in a diagonal $M \times M$ matrix, $P$. These probabilities can be derived
from the market or assumed subjectively according to an investor's preference. As before, denoting by $s_i$ the
transaction costs for each tradable security, we can seek



$$w^{[\tau]} = \arg\min_{w} \left[ \|P(y + Xw)\|_2^2 + \tau \sum_i s_i |w_i| \right] .$$
As before, the investor could then apply one of the algorithms above to generate a list of optimal portfolios for
a wide range of values of $\tau$. Then, just as in the index tracking case, the investor could observe the attainable
combinations of expected mark-to-market variance ($\|P(y + Xw)\|_2^2$) and transaction cost ($\sum_i s_i |w_i|$). One
appealing feature of this method is that it does not explicitly determine the number of assets to be included in
the hedge portfolio. The optimization naturally trades off portfolio volatility for transaction cost, rather than
imposing an artificial cap on either.
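The hedging objective has the same least-squares form as before once the scenario data are rescaled: $\|P(y + Xw)\|_2^2 = \|\tilde{X}w - \tilde{y}\|_2^2$ with $\tilde{X} = PX$ and $\tilde{y} = -Py$. The sketch below builds these rescaled inputs; the function names and the scenario numbers are hypothetical.

```python
from math import sqrt


def hedge_to_least_squares(probs, y, X):
    """Return (X_tilde, y_tilde) with ||P (y + X w)||^2 == ||X_tilde w - y_tilde||^2,
    where P = diag(sqrt(p_i))."""
    roots = [sqrt(p) for p in probs]
    X_tilde = [[r * x for x in row] for r, row in zip(roots, X)]
    y_tilde = [-r * y_i for r, y_i in zip(roots, y)]
    return X_tilde, y_tilde


def expected_sq_change(probs, y, X, w):
    """Probability-weighted squared change in value, sum_i p_i (y_i + (X w)_i)^2."""
    return sum(p * (y_i + sum(x * w_j for x, w_j in zip(row, w))) ** 2
               for p, y_i, row in zip(probs, y, X))


# Three hypothetical scenarios, two hedging instruments.
probs = [0.5, 0.3, 0.2]
y = [-10.0, 0.0, 5.0]                        # change in value of the existing portfolio
X = [[1.0, 0.5], [0.0, 0.1], [-1.0, -0.4]]   # change in value per unit of each instrument
w = [2.0, -1.0]
X_tilde, y_tilde = hedge_to_least_squares(probs, y, X)
lhs = expected_sq_change(probs, y, X, w)
rhs = sum((sum(x * w_j for x, w_j in zip(row, w)) - y_i) ** 2
          for row, y_i in zip(X_tilde, y_tilde))
print(lhs, rhs)  # equal up to rounding
```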

5.3 Portfolio Adjustment 

Thus far, we have assumed that investors start with no assets, and must construct a portfolio to perform a
particular task. However, this is rarely the case in the real world. Instead, investors frequently hold a large
number of securities and must modify their existing holdings to achieve a particular goal. In this context, the
investor already holds a portfolio $w$ and must make an adjustment $\Delta_w$. In that case, the final portfolio will be
$w + \Delta_w$, but the transaction costs will be relevant only for the adjustment $\Delta_w$. The corresponding optimization
problem is given by

$$\Delta_w^{[\tau]} = \arg\min_{\Delta_w} \left[ \|\rho 1_T - R(w + \Delta_w)\|_2^2 + \tau \|\Delta_w\|_1 \right]$$

$$\text{s.t.} \quad \Delta_w^T \hat{\mu} = 0, \qquad \Delta_w^T 1_N = 0 .$$

It is easy to modify our methodology to handle this situation.
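One way to see this is to absorb the existing holdings into the target vector. In the sketch below, $y_w$ is notation introduced here for illustration, and $A$ is assumed to be the matrix whose two rows are $\hat{\mu}^T$ and $1_N^T$:

```latex
\begin{aligned}
\rho 1_T - R(w + \Delta_w) &= \underbrace{(\rho 1_T - R w)}_{=:\, y_w} - R\,\Delta_w , \\
\Delta_w^{[\tau]} &= \arg\min_{\Delta_w}
  \left[ \| y_w - R\,\Delta_w \|_2^2 + \tau \| \Delta_w \|_1 \right]
  \quad \text{s.t.} \quad A\,\Delta_w = 0 .
\end{aligned}
```

This has the form of the linearly constrained problem treated in the Appendix, with $y$ replaced by $y_w$ and a zero right-hand side in the constraint.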

6 Conclusion 

We have devised a method that constructs stable and sparse portfolios by introducing an $\ell_1$ penalty in the
Markowitz portfolio optimization. We obtain as special cases the no-short-positions portfolios, which also comprise
few active assets. To our knowledge, such a sparsity property of the non-negative portfolios has not
been previously noticed in the literature. The portfolios we propose can be seen as natural extensions of the 
no-short-positions portfolios and maintain or improve their performances while preserving as much as possible 
their sparse nature. 

We have also described an efficient algorithm for computing the optimal, sparse portfolios, and we have 
implemented it using as assets two sets of portfolios constructed by Fama and French: 48 industry portfolios 
and 100 portfolios formed on size and book-to-market. We found empirical evidence that the optimal sparse 
portfolios outperform the evenly-weighted portfolios by achieving a smaller variance; moreover they do so with 
only a small number of active positions, and the effect is observed over a range of values for this number. This 
shows that adding an $\ell_1$ penalty to objective functions is a powerful tool for various portfolio construction
tasks. This penalty forces our optimization scheme to select, on the basis of the training data, few assets 
forming a stable and robust portfolio, rather than being "distracted" by the instabilities due to collinearities 
and responsible for meaningless artifacts in the presence of estimation errors. 

Many variants and improvements are possible on the simple procedure described and illustrated above. This 
goes beyond the scope of the present paper which was to propose a new methodology and to demonstrate its 
validity. In future work, we plan to extend our empirical exercises to other and larger asset collections (such as 
S&P500), to explore other performance criteria than the usual Sharpe ratio, and to derive automatic procedures 
for choosing the number of assets to be included in our portfolio. We believe that the good regularization 
properties of our method should ensure its robustness against these various changes. 

Acknowledgments 

We thank Tony Berrada, Laura Coroneo, Simone Manganelli, Sergio Pastorello, Lucrezia Reichlin and Olivier
Scaillet for helpful suggestions and discussions. Part of this research has been supported by the "Action de
Recherche Concertée" Nb 02/07-281 (CDM and DG), the Francqui Foundation (IL), the VUB-GOA 62 grant
(ID, CDM and IL), the National Bank of Belgium BNB (DG and CDM), and by the NSF grant DMS-0354464
(ID). Joshua Brodie thanks Bobray Bordelon, Jian Bai, Josko Plazonic and John Vincent for their valuable
technical assistance during his senior thesis work, out of which grew his and ID's participation in this project.
The opinions in this paper are those of the authors and do not necessarily reflect the views of the European
Central Bank.

References 

Bertero, M., and P. Boccacci (1998): Introduction to Inverse Problems in Imaging. Institute of Physics
Publishing, London.

Brodie, J., I. Daubechies, C. De Mol, and D. Giannone (2007): "Sparse and Stable Markowitz Portfolios,"
Preprint arXiv:0708.0046v1; http://arxiv.org/abs/0708.0046.

Chen, S. S., D. Donoho, and M. Saunders (2001): "Atomic Decomposition by Basis Pursuit," SIAM
Review, 43, 129-159.

Daubechies, I., M. Defrise, and C. De Mol (2004): "An Iterative Thresholding Algorithm for Linear
Inverse Problems With a Sparsity Constraint," Communications on Pure and Applied Mathematics, 57,
1416-1457.

De Mol, C., D. Giannone, and L. Reichlin (2006): "Forecasting Using a Large Number of Predictors: Is
Bayesian Regression a Valid Alternative to Principal Components?," CEPR Discussion Papers 5829,
forthcoming in Journal of Econometrics.

DeMiguel, V., L. Garlappi, F. J. Nogales, and R. Uppal (2007): "A Generalized Approach to Portfolio
Optimization: Improving Performance By Constraining Portfolio Norms," Preprint July 2007.

DeMiguel, V., L. Garlappi, and R. Uppal (2007): "Optimal versus Naive Diversification: How Inefficient
Is the 1/N Portfolio Strategy?," Preprint January 2007, forthcoming in Review of Financial Studies.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004): "Least Angle Regression," Annals of
Statistics, 32, 407-499.

Jagannathan, R., and T. Ma (2003): "Risk Reduction in Large Portfolios: Why Imposing the Wrong
Constraints Helps," Journal of Finance, 58(4), 1651-1684.

Markowitz, H. (1952): "Portfolio Selection," Journal of Finance, 7, 77-91.

Osborne, M. R., B. Presnell, and B. A. Turlach (2000a): "A New Approach to Variable Selection in
Least Squares Problems," IMA Journal of Numerical Analysis, 20(3), 389-403.

Osborne, M. R., B. Presnell, and B. A. Turlach (2000b): "On the LASSO and Its Dual," Journal of
Computational and Graphical Statistics, 9(2), 319-337.

Tibshirani, R. (1996): "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical
Society Series B, 58, 267-288.



Appendix: Constrained minimization algorithm 

Before discussing our solution method for the linearly constrained $\ell_1$-penalized least-squares problem, we briefly
recall the homotopy/LARS method which manages to recover the unconstrained minimizer of the $\ell_1$-penalized
least-squares objective function

$$w(\tau) = \arg\min_{w} \left[ \|Rw - y\|_2^2 + \tau \|w\|_1 \right]$$

for a whole range of values of the (positive) penalty parameter $\tau$.
The variational equations describing the minimizer $w(\tau)$ are:

$$(R^T(y - Rw))_i = \frac{\tau}{2}\,\mathrm{sgn}\,w_i, \qquad w_i \neq 0 \tag{8}$$

$$|(R^T(y - Rw))_i| \leq \frac{\tau}{2}, \qquad w_i = 0. \tag{9}$$

The minimizer $w(\tau)$ is a continuous piecewise linear function of $\tau$. We shall denote the breakpoints by $\tau_0 >
\tau_1 > \ldots$ and the corresponding minimizers by $w(\tau_0), w(\tau_1), \ldots$ The breakpoints occur where a new component
enters or leaves the support of $w(\tau)$. We will use $b$ to denote the residual $b = R^T(y - Rw)$.

The homotopy/LARS method for solving these equations starts by considering the point $w = 0$, which
satisfies the equations (8,9) for all $\tau \geq \tau_0 = 2\max_i |(R^T y)_i|$. Hence $w(\tau \geq \tau_0) = 0$.

Given a breakpoint $w(\tau_n)$, it is possible to construct the next breakpoint $w(\tau_{n+1})$ by solving a small linear
system. Let $J = \{i \text{ for which } |b_i| = \tau_n/2\}$ (i.e. the set of maximal residual), and let $R_J$ be the submatrix consisting of
the columns $J$ of $R$. We define the walking direction $u$ by

$$R_J^T R_J\, u_J = \mathrm{sgn}(b_J)$$

and $u_i = 0$ for $i \notin J$ ($\mathrm{sgn}(b_J)$ denotes the vector $(\mathrm{sgn}(b_j))_{j \in J}$). In this way, a step $w \to w + \gamma u$ results in a
change in the residual $b \to b - \gamma v$, where $v_j = \mathrm{sgn}(b_j)$ for $j \in J$. In other words, the maximal components of
the residual decrease at the same rate. The step size $\gamma > 0$ is now determined to be the smallest number for
which the absolute value of a component $|b_i - \gamma v_i|$ (with $i \notin J$) of the new residual becomes equal to $|b_j - \gamma v_j|$
for $j \in J$ (i.e. a new component joins the maximal residual set), or for which a nonzero component of $w$ is
turned into zero.

The new penalty parameter is then $\tau_{n+1} = \tau_n - 2\gamma$ (which is smaller than $\tau_n$), and the corresponding
minimizer is $w(\tau_{n+1}) = w(\tau_n) + \gamma u$. By construction it is guaranteed to satisfy the variational equations (8,9).
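The first step of this procedure, from $w = 0$ at $\tau_0$ to the next breakpoint, can be sketched in a few lines. The code below handles only this first step and assumes a unique maximal residual component at the start, so the linear system for the walking direction is $1 \times 1$ and no component can leave the support; it is an illustration, not the implementation used for the results in this paper. The function names and the toy data are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def first_homotopy_step(R, y):
    """One homotopy/LARS step starting from w = 0. Returns (tau0, tau1, w1)."""
    n_assets = len(R[0])
    cols = [[row[i] for row in R] for i in range(n_assets)]
    b = [dot(c, y) for c in cols]                      # residual b = R^T y at w = 0
    j = max(range(n_assets), key=lambda i: abs(b[i]))  # maximal residual set J = {j}
    half_tau0 = abs(b[j])                              # tau_0 / 2
    sign_j = 1.0 if b[j] > 0 else -1.0
    u_j = sign_j / dot(cols[j], cols[j])               # solves R_J^T R_J u_J = sgn(b_J)
    Ru = [u_j * r for r in cols[j]]
    v = [dot(c, Ru) for c in cols]                     # b -> b - gamma * v, v_j = sgn(b_j)
    gamma = half_tau0                                  # upper limit: penalty reaches zero
    for i in range(n_assets):
        if i == j:
            continue
        # |b_i - gamma * v_i| = tau_0/2 - gamma has at most two positive solutions.
        for num, den in ((half_tau0 - b[i], 1.0 - v[i]),
                         (half_tau0 + b[i], 1.0 + v[i])):
            if den > 0.0 and 0.0 < num / den < gamma:
                gamma = num / den
    w_next = [0.0] * n_assets
    w_next[j] = gamma * u_j
    return 2.0 * half_tau0, 2.0 * (half_tau0 - gamma), w_next


# Toy data: 4 periods, 3 assets.
R = [[0.02, 0.01, -0.01],
     [-0.01, 0.03, 0.02],
     [0.03, -0.02, 0.01],
     [0.01, 0.02, -0.02]]
y = [0.016, 0.006, 0.010, 0.014]
tau0, tau1, w1 = first_homotopy_step(R, y)
print(tau0, tau1, w1)
```

At the returned breakpoint a second component of the residual has reached the maximal value $\tau_1/2$, so the next step would solve a $2 \times 2$ system.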

The two main advantages of this method are thus that it is exact (in particular zero components are really
zero) and that it yields the breakpoints (and hence the minimizers) for a whole range of values of the penalization
parameter $\tau \geq \tau_{\mathrm{stop}} \geq 0$. At each step, only a relatively small linear system has to be solved. If this procedure
is carried through until the end, one finds

$$\lim_{\tau \to 0} \arg\min_{w} \left[ \|Rw - y\|_2^2 + \tau \|w\|_1 \right] = \arg\min_{w \ \text{s.t.}\ R^T R w = R^T y} \|w\|_1 .$$

For the constrained case, i.e. the minimization problem

$$w(\tau) = \arg\min_{w \ \text{s.t.}\ Aw = a} \left[ \|Rw - y\|_2^2 + \tau \|w\|_1 \right] \tag{10}$$

subject to the linear constraint $Aw = a$, we can devise a similar procedure. We assume, of course, that the
constraint $Aw = a$ has a solution.

An approximation of the minimizer $w(\tau)$ can be obtained by applying the unconstrained procedure described
above to the objective function

$$w(\tau_\epsilon) = \arg\min_{w} \left[ \|Aw - a\|_2^2 + \epsilon \|Rw - y\|_2^2 + \tau_\epsilon \|w\|_1 \right] . \tag{11}$$

For sufficiently small $\epsilon$, this will give a good approximation of the constrained minimizer $w(\tau)$ corresponding
to the penalty $\tau = \tau_\epsilon/\epsilon$ (after first going through a number of breakpoints for which $Aw \neq a$, not even
approximately). However, this is clearly an approximate method (often very good) whereas the unconstrained
procedure did not involve any approximation.
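In practice, the objective (11) can be handed to any unconstrained solver by stacking the constraint rows on top of the rescaled data, since $\|Aw - a\|_2^2 + \epsilon\|Rw - y\|_2^2 = \|\tilde{R}w - \tilde{y}\|_2^2$ with $\tilde{R} = [A; \sqrt{\epsilon}R]$ and $\tilde{y} = [a; \sqrt{\epsilon}y]$. A minimal sketch of this stacking, with hypothetical function names and toy numbers:

```python
from math import sqrt


def stack_constraint(A, a, R, y, eps):
    """Build (R_tilde, y_tilde) such that
    ||R_tilde w - y_tilde||^2 == ||A w - a||^2 + eps * ||R w - y||^2."""
    root = sqrt(eps)
    R_tilde = [list(row) for row in A] + [[root * r for r in row] for row in R]
    y_tilde = list(a) + [root * v for v in y]
    return R_tilde, y_tilde


def sq_residual(M, v, w):
    """Squared Euclidean norm of M w - v."""
    return sum((sum(m * x for m, x in zip(row, w)) - v_i) ** 2
               for row, v_i in zip(M, v))


# Toy data: one constraint (weights sum to 1), two periods, two assets.
A, a = [[1.0, 1.0]], [1.0]
R, y = [[0.02, 0.01], [-0.01, 0.03]], [0.01, 0.02]
eps = 1e-3
w = [0.3, 0.5]
R_tilde, y_tilde = stack_constraint(A, a, R, y, eps)
print(sq_residual(R_tilde, y_tilde, w),
      sq_residual(A, a, w) + eps * sq_residual(R, y, w))  # equal up to rounding
```

The unconstrained path would then be run on `(R_tilde, y_tilde)` with penalty $\tau_\epsilon = \epsilon\tau$.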

We solve this issue, and provide an exact method, by solving the minimization problem (11) up to the first
order in $\epsilon$. In this approach $\epsilon$ is a small formal positive parameter. Now the minimizer $w(\tau_\epsilon)$ and $\tau_\epsilon$ both
depend on $\epsilon$. We can write $w = w^{(0)} + \epsilon w^{(1)} + O(\epsilon^2)$ and $\tau_\epsilon = \tau^{(0)} + \tau^{(1)}\epsilon + O(\epsilon^2)$. We again follow the
procedure for the unconstrained method, but take care to use arithmetic (addition, multiplication, comparison,
. . . ) up to first order in $\epsilon$.
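Arithmetic "up to first order" in a formal positive infinitesimal can be modelled by pairs of numbers. The class below is a hypothetical illustration of that idea (it is not the Mathematica code used for this paper): products drop the $O(\epsilon^2)$ term, and comparisons look at the zeroth-order part first and use the first-order part only to break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstOrder:
    """The formal quantity zero + first * eps, with eps > 0 infinitesimal."""
    zero: float
    first: float = 0.0

    def __add__(self, other):
        return FirstOrder(self.zero + other.zero, self.first + other.first)

    def __sub__(self, other):
        return FirstOrder(self.zero - other.zero, self.first - other.first)

    def __mul__(self, other):
        # (a0 + a1 eps)(b0 + b1 eps) = a0 b0 + (a0 b1 + a1 b0) eps + O(eps^2)
        return FirstOrder(self.zero * other.zero,
                          self.zero * other.first + self.first * other.zero)

    def __lt__(self, other):
        # Zeroth-order parts decide; first-order parts only break ties.
        return (self.zero, self.first) < (other.zero, other.first)


product = FirstOrder(2.0, 1.0) * FirstOrder(3.0, 4.0)
print(product)
print(FirstOrder(0.0, 5.0) < FirstOrder(1.0, -100.0))
```

Division, which the step-size computation also needs, is omitted from this sketch.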

As before, one starts from $w = 0$, corresponding to a large initial value of $\tau_\epsilon$, and follows the path of
descending $\tau_\epsilon$. The strategy consists of satisfying the variational equations

$$(A^T(a - Aw) + \epsilon R^T(y - Rw))_i = \frac{\tau_\epsilon}{2}\,\mathrm{sgn}\,w_i, \qquad w_i \neq 0 \tag{12}$$

$$|(A^T(a - Aw) + \epsilon R^T(y - Rw))_i| \leq \frac{\tau_\epsilon}{2}, \qquad w_i = 0 \tag{13}$$

at each breakpoint by carefully determining a walking direction $u = u^{(0)} + u^{(1)}\epsilon + O(\epsilon^2)$ and a step length
$\gamma = \gamma^{(0)} + \gamma^{(1)}\epsilon + O(\epsilon^2)$. Using $w = w^{(0)} + \epsilon w^{(1)} + O(\epsilon^2)$, we can rewrite equations (12) as

$$(A^T(a - Aw^{(0)}))_i = \frac{\tau^{(0)}}{2}\,\mathrm{sgn}\,w_i \tag{14}$$

$$(-A^T A w^{(1)} + R^T(y - Rw^{(0)}))_i = \frac{\tau^{(1)}}{2}\,\mathrm{sgn}\,w_i. \tag{15}$$
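The split of (12) into a zeroth-order and a first-order equation follows from substituting the two expansions and matching powers of $\epsilon$; a short sketch of that step:

```latex
\begin{aligned}
A^T\big(a - A(w^{(0)} + \epsilon w^{(1)})\big)
  + \epsilon R^T\big(y - R(w^{(0)} + \epsilon w^{(1)})\big)
 &= A^T(a - A w^{(0)})
  + \epsilon \big( -A^T A w^{(1)} + R^T(y - R w^{(0)}) \big) + O(\epsilon^2), \\
\frac{\tau_\epsilon}{2}
 &= \frac{\tau^{(0)}}{2} + \epsilon\,\frac{\tau^{(1)}}{2} + O(\epsilon^2).
\end{aligned}
```

Equating the $\epsilon^0$ terms gives (14) and equating the $\epsilon^1$ terms gives (15).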



From a known breakpoint $w$ we can proceed to the following breakpoint by a step direction $u$ and step size $\gamma$
(both depending on $\epsilon$). We again set

$$J = \arg\max_i \left| \left( A^T(a - Aw^{(0)}) + \epsilon\left(-A^T A w^{(1)} + R^T(y - Rw^{(0)})\right) \right)_i \right| .$$
As long as ^ 0, the components J of u are determined by 

^.7 J ^ u « J - sgn( bj ) J ^ 

and the other components of $u$ remain zero. The step size $\gamma$ is again determined as before, i.e. when a new
component enters the maximal residual set, or when a component leaves the active set. The penalty parameter
$\tau_\epsilon$ decreases as before: $\tau_\epsilon \to \tau_\epsilon - 2\gamma$.

At some point in this procedure, $\tau_\epsilon$ will become zero in zeroth order: $\tau_\epsilon = 0 + \tau^{(1)}\epsilon + O(\epsilon^2)$. The corresponding
minimizer (more precisely the zeroth-order part of this breakpoint) will satisfy the constraint $Aw = a$ and we
will have found the first constrained minimizer $w$ of (10), corresponding to $\tau_0 = \tau^{(1)}$ (i.e. the first-order part of
the parameter $\tau_\epsilon$ of the $\epsilon$-dependent problem at this breakpoint). In the unconstrained case, no such calculations
were necessary as the starting point was always equal to 0. Similarly to the unconstrained case, we have that
$w(\tau \geq \tau_0) = w(\tau_0)$.

In principle, one could continue the $\epsilon$-dependent algorithm, but now that the first breakpoint of $w(\tau)$ is
determined, it is more advantageous to continue the descent of $\tau$ by introducing Lagrange multipliers $\lambda$ for the
problem (10):

$$w(\tau) = \arg\min_{\lambda,\, w \ \text{s.t.}\ Aw = a} \left[ \|Rw - y\|_2^2 + \tau \|w\|_1 + 2\lambda^T(a - Aw) \right] .$$

This minimization problem (analogous to (10)) amounts to solving the equations:

$$(R^T(y - Rw) + A^T\lambda)_i = \frac{\tau}{2}\,\mathrm{sgn}\,w_i, \qquad w_i \neq 0 \tag{17}$$

$$|(R^T(y - Rw) + A^T\lambda)_i| \leq \frac{\tau}{2}, \qquad w_i = 0 \tag{18}$$

$$Aw = a. \tag{19}$$

Equation (17) is the equivalent of equation (15) whereas equation (19) replaces equation (14). We now already
have $\tau_0$, $w(\tau \geq \tau_0)$ and the initial Lagrange multipliers $\lambda = -Aw^{(1)}$ (from the first-order part of the last step
of the $\epsilon$-dependent problem).

To proceed from one breakpoint to the next ($w \to w + \gamma u$, and $\lambda \to \lambda + \gamma s$ as the multipliers also change),
we again need to solve a linear system:

$$\begin{pmatrix} R_J^T R_J & -A_J^T \\ A_J & 0 \end{pmatrix} \begin{pmatrix} u_J \\ s \end{pmatrix} = \begin{pmatrix} \mathrm{sgn}(b_J) \\ 0 \end{pmatrix} \tag{20}$$

with $b = R^T(y - Rw) + A^T\lambda$. This will guarantee that $w \to w + \gamma u$ and $\lambda \to \lambda + \gamma s$ still satisfy the constraint
(19) and the variational equations (17,18). The step size $\gamma$ is determined by the same rule as before: stop when
a new component enters the set $J = \arg\max_i |(R^T(y - Rw) + A^T\lambda)_i|$ or when a nonzero component of $w$ is set
to zero. Notice the differences and similarities between the linear systems (20) and (16).
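The two requirements on the pair $(u, s)$ can be read off by tracking how the residual and the constraint change along a step. The following is a sketch under the sign convention of (17), i.e. $b = R^T(y - Rw) + A^T\lambda$:

```latex
\begin{aligned}
b(w + \gamma u,\ \lambda + \gamma s)
  &= R^T\big(y - R(w + \gamma u)\big) + A^T(\lambda + \gamma s)
   = b - \gamma\,\big(R^T R\,u - A^T s\big), \\
A(w + \gamma u) &= a + \gamma\,A u .
\end{aligned}
```

With $u_i = 0$ outside $J$, the constraint (19) is preserved when $A_J u_J = 0$, and the maximal components of $b$ shrink at the common rate $\gamma$ when $R_J^T R_J u_J - A_J^T s = \mathrm{sgn}(b_J)$.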

At each breakpoint, this algorithm provides the penalty $\tau_n$, the corresponding minimizer $w(\tau_n)$ and the
Lagrange multipliers $\lambda_n$. Unlike for the unconstrained case, it is now possible that $w(\tau)$ remains constant
between two breakpoints (i.e. only the Lagrange multipliers $\lambda$ change).
One simplifying assumption (not solved in the homotopy/LARS algorithm) was made in the above description
of the algorithm: if the set of maximal residual and the support set differ by more than one component,
one should carefully select the correct new components to enter the support. This can be done by using the
variational equations, and our implementation handles this case.

One could argue that the starting point (i.e. the first breakpoint) for the constrained minimization problem
is simply given by $w(\tau_0) = \arg\min_{w \ \text{s.t.}\ Aw = a} \|w\|_1$, which could be calculated by letting the unconstrained solution
procedure run its course: $w(\tau_0) = \lim_{\sigma \to 0} \arg\min_w \left[ \|Aw - a\|_2^2 + \sigma\|w\|_1 \right]$. Generically (i.e. excluding special
cases), this is correct. However, the problem is that sometimes the minimizer $\arg\min_{w \ \text{s.t.}\ Aw = a} \|w\|_1$ is not
unique. In that case, the starting point for the constrained minimizer is not solely determined by $A$ and $a$ but
also by $R$ and $y$. In this case, the $\epsilon$-dependent algorithm still chooses the correct starting point from the set
$\arg\min_{w \ \text{s.t.}\ Aw = a} \|w\|_1$. This is important to mention because the special constraint $\sum_i w_i = 1$ used in this paper
gives rise to such cases.
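The non-uniqueness is easy to see for the budget constraint taken on its own: since $\sum_i |w_i| \geq |\sum_i w_i| = 1$, with equality exactly when all weights are non-negative, every portfolio without short positions attains the minimal $\ell_1$ norm. A small numerical illustration with hypothetical weight vectors:

```python
def l1_norm(w):
    return sum(abs(x) for x in w)


# Two different portfolios without short positions, both with sum(w) = 1 ...
w_a = [1.0, 0.0, 0.0]
w_b = [0.25, 0.25, 0.5]
# ... and one portfolio with a short position, also with sum(w) = 1.
w_c = [1.5, -0.5, 0.0]

for w in (w_a, w_b, w_c):
    print(w, sum(w), l1_norm(w))
```

The first two vectors share the minimal norm 1, so the constraint alone cannot single out a starting point; the third has norm 2.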

Our algorithm is well-suited for the portfolio problems discussed in this paper. The size of the matrix, the 
number of constraints (just two) and, more importantly, the number of nonzero weights in the portfolios are 
such that a minimization run (i.e. finding the minimizer for a whole range of penalty parameters) can be done 
in a fraction of a second on a standard desktop. 

We calculated the portfolio examples in this paper using both the formal $\epsilon$ approach (in Mathematica) and
the approximate small $\epsilon$ approach (in Matlab). The outcomes were always consistent.